Improving lightly supervised training for broadcast transcription
This paper investigates improving lightly supervised acoustic
model training for an archive of broadcast data. Standard
lightly supervised training uses decoding hypotheses derived
automatically with a biased language model. However, as the
actual speech can deviate significantly from the original programme
scripts that are supplied, the quality of standard lightly
supervised hypotheses can be poor. To address this issue, word
and segment level combination approaches are used between
the lightly supervised transcripts and the original programme
scripts which yield improved transcriptions. Experimental results
show that systems trained using these improved transcriptions
consistently outperform those trained using only the original
lightly supervised decoding hypotheses. This is shown to be
the case for both the maximum likelihood and minimum phone
error trained systems.
The research leading to these results was supported by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology). This is the accepted manuscript version. The final version is available at http://www.isca-speech.org/archive/interspeech_2013/i13_2187.html
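As a rough illustration of the segment-level idea in the abstract above, the following sketch chooses, for each segment, between the lightly supervised hypothesis and the original script text. The selection rule (word-level agreement measured with Python's `difflib`), the function name `select_segment_transcript`, and the threshold `min_agreement` are all assumptions made for this example; the abstract does not specify how the paper performs the combination.

```python
from difflib import SequenceMatcher


def select_segment_transcript(hypothesis: str, script: str,
                              min_agreement: float = 0.8) -> str:
    """Pick the transcript to train on for one segment.

    Hypothetical rule: if the decoding hypothesis and the script agree
    closely at the word level, trust the script; otherwise keep the
    hypothesis, since the speech probably deviates from the script.
    """
    hyp_words = hypothesis.lower().split()
    script_words = script.lower().split()
    # ratio() is 2 * matched_words / total_words, in the range [0, 1].
    agreement = SequenceMatcher(None, hyp_words, script_words,
                                autojunk=False).ratio()
    return script if agreement >= min_agreement else hypothesis
```

With the default threshold, a hypothesis that differs from the script by a single word in a six-word segment falls back to the script, while a hypothesis sharing no words with the script is kept as it is.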
Automatic transcription of multi-genre media archives
This paper describes some recent results of our collaborative work on
developing a speech recognition system for the automatic transcription
of media archives from the British Broadcasting Corporation (BBC). The
material includes a wide diversity of shows with their associated
metadata. The latter are highly diverse in terms of completeness,
reliability and accuracy. First, we investigate how to improve lightly
supervised acoustic training, when timestamp information is inaccurate
and when speech deviates significantly from the transcription, and how
to perform evaluations when no reference transcripts are available.
An automatic timestamp correction method and word and segment
level combination approaches between the lightly supervised transcripts
and the original programme scripts are presented which yield improved
metadata. Experimental results show that systems trained using the
improved metadata consistently outperform those trained with only the
original lightly supervised decoding hypotheses. Secondly, we show that
the recognition task may benefit from systems trained on a combination
of in-domain and out-of-domain data. Working with tandem HMMs, we
describe Multi-level Adaptive Networks, a novel technique for
incorporating information from out-of-domain posterior features using
deep neural networks. We show that it provides a substantial reduction in
WER over other systems including a PLP-based baseline, in-domain tandem
features, and the best out-of-domain tandem features.
This research was supported by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology). This paper was presented at the First Workshop on Speech, Language and Audio in Multimedia, August 22-23, 2013; Marseille. It was published in CEUR Workshop Proceedings at http://ceur-ws.org/Vol-1012/
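The abstract above reports its results as reductions in WER (word error rate). For readers unfamiliar with the measure, here is a minimal sketch of the standard definition: the number of word substitutions, deletions and insertions needed to turn the reference into the hypothesis, divided by the number of reference words. The function name and the whitespace tokenisation are assumptions for illustration; scoring tools used in evaluations typically also apply text normalisation, which is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.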
The experience of admission to psychiatric hospital among Chinese adult patients in Hong Kong
Background: The paper reports on a study to evaluate the psychometric properties and cultural appropriateness of the Chinese translation of the Admission Experience Survey (AES). Methods: The AES was translated into Chinese and back-translated. Content validity was established by focus groups and expert panel review. The Chinese version of the Admission Experience Survey (C-AES) was administered to 135 consecutively recruited adult psychiatric patients in the Castle Peak Hospital (Hong Kong SAR, China) within 48 hours of admission. Construct validity was assessed by comparing the scores of patients admitted voluntarily with those of patients committed involuntarily, and the scores of those who received physical or chemical restraint with those of patients who did not. The relationship between admission experience and psychopathology was examined by correlating C-AES scores with the Brief Psychiatric Rating Scale (BPRS) scores. Results: Spearman's item-to-total correlations of the C-AES ranged from 0.50 to 0.74. Three factors from the C-AES were extracted using factor analysis. Item 12 was omitted because of poor internal consistency and factor loading. The factor structure of the Process Exclusion Scale (C-PES) corresponded to the English version, while some discrepancies were noted in the Perceived Coercion Scale (C-PCS) and the Negative Pressure Scale (C-NPS). All subscales had good internal consistencies. Scores were significantly higher for patients either committed involuntarily or subjected to chemical or physical restraint, independent of the severity of psychotic symptoms. Conclusion: The Chinese AES is a psychometrically sound instrument assessing three different aspects of the experience of admission, namely "negative pressure", "process exclusion" and "perceived coercion". The potential of the C-AES in exploring the subjective experience of psychiatric admission and its effects on treatment adherence should be further explored.
Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion
The electronic version of this article is the complete one and can be found online at: http://dx.doi.org/10.1186/s13636-015-0063-8
Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as a part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for a moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and provides an in-depth analysis based on some search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms). This work has been partly supported by project CMC-V2
(TEC2012-37585-C02-01) from the Spanish Ministry of Economy and
Competitiveness. This research was also funded by the European Regional
Development Fund, the Galician Regional Government (GRC2014/024,
“Consolidation of Research Units: AtlantTIC Project” CN2012/160).
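The STD abstract above describes system output as a list of detections, each with a speech file, start and end times, and a confidence score. The sketch below shows one way such output could be compared against reference occurrences. The class names, the time-overlap matching rule and the decision threshold are hypothetical choices made for this example; the abstract does not state the matching rule or the evaluation metrics actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """A reference occurrence of a search term (hypothetical structure)."""
    file_id: str
    term: str
    start: float  # seconds
    end: float    # seconds


@dataclass(frozen=True)
class Detection(Occurrence):
    """A system detection: an occurrence plus a confidence score."""
    score: float = 0.0


def overlaps(det: Detection, ref: Occurrence) -> bool:
    # Assumed rule: same file, same term, and intersecting time spans.
    return (det.file_id == ref.file_id and det.term == ref.term
            and det.start < ref.end and ref.start < det.end)


def count_outcomes(detections, references, threshold=0.5):
    """Return (hits, false_alarms, misses) for detections scoring >= threshold."""
    matched = set()
    hits = false_alarms = 0
    # Consider the most confident detections first; each reference
    # occurrence can be matched by at most one detection.
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        if det.score < threshold:
            continue
        match = next((i for i, ref in enumerate(references)
                      if i not in matched and overlaps(det, ref)), None)
        if match is None:
            false_alarms += 1
        else:
            matched.add(match)
            hits += 1
    return hits, false_alarms, len(references) - hits
```

Raising or lowering `threshold` trades misses against false alarms, which is why each detection carries a confidence score.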
Lens stem cells may reside outside the lens capsule: an hypothesis
In this paper, we consider the ocular lens in the context of contemporary developments in biological ideas. We attempt to reconcile lens biology with stem cell concepts and a dearth of lens tumors.
“A Hideous Torture on Himself”: Madness and Self-Mutilation in Victorian Literature
This paper suggests that late nineteenth-century definitions of self-mutilation, a new category of psychiatric symptomatology, were heavily influenced by the use of self-injury as a rhetorical device in the novel, for the literary text held a high status in Victorian psychology. In exploring Dimmesdale’s “self-mutilation” in The Scarlet Letter in conjunction with psychiatric case histories, the paper indicates a number of common techniques and themes in literary and psychiatric texts. As well as illuminating key elements of nineteenth-century conceptions of the self, and the relation of mind and body through ideas of madness, this exploration also serves to highlight the social commentary implicit in many Victorian medical texts. Late nineteenth-century England, like mid-century New England, required the individual to help himself and, simultaneously, others; personal charity and individual philanthropy were encouraged, while state intervention was often presented as dubious. In both novel and psychiatric text, self-mutilation is thus presented as the ultimate act of selfish preoccupation, particularly in cases on the “borderlands” of insanity.
Using sub-word-level information for confidence estimation with conditional random field models
The task of word-level confidence estimation (CE) for automatic speech recognition (ASR) systems stands to benefit from the combination of suitably defined input features from multiple information sources. However, the information sources of interest may not necessarily operate at the same level of granularity as the underlying ASR system. The research described here builds on previous work on confidence estimation for ASR systems using features extracted from word-level recognition lattices, by incorporating information at the sub-word level. Furthermore, the use of Conditional Random Fields (CRFs) with hidden states is investigated as a technique to combine information for word-level CE. Performance improvements are shown using the sub-word-level information in linear-chain CRFs with appropriately engineered feature functions, as well as when applying the hidden-state CRF model at the word level.